Personalized speech recognizer with keyword-based personalized lexicon and language model using word vector representations

Authors

  • Ching-feng Yeh
  • Yuan-ming Liou
  • Hung-yi Lee
  • Lin-Shan Lee
Abstract

The popularity of mobile devices offers an ideal platform for personalized recognizers. With data collected from the user, a personalized recognizer with better-matched acoustic and linguistic characteristics can offer not only better recognition accuracy but also less computation time. In this paper, we propose a scenario in which a small data set (500 annotated utterances) is collected for each user and used to personalize the recognizer. Based on this scenario, we present an overall framework for improving accuracy and reducing computation time. We train Gaussian Mixture Models (GMMs) on word vector representations [1][2] and develop word clustering and keyword extraction approaches for personalizing the lexicon and language model. Prototype recognition systems with CD-DNN-HMM [3][4][5] acoustic models adapted by fDLR [6][7][8][9] were implemented and tested for 10 target users. The results show that the personalized lexicon can include many more user-specific words not obtained before, and a significant improvement was observed in the tradeoff between recognition accuracy and real-time factor.
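The word-clustering idea in the abstract can be illustrated with a minimal sketch: fit a small GMM to word vectors by EM and assign each word to its most likely component. The abstract does not give the number of components, the covariance structure, or how the word vectors were trained, so the diagonal covariances, the two components, the made-up 2-D vectors, and all names (`fit_gmm`, `posteriors`, `toy_vectors`) are assumptions for illustration only, not the authors' implementation.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def posteriors(x, weights, means, variances):
    """Posterior probability of each mixture component given x."""
    logs = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)  # subtract the max for numerical stability
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]

def fit_gmm(vectors, init_means, n_iter=50, var_floor=1e-3):
    """Fit a diagonal GMM with EM; returns (weights, means, variances)."""
    k, dim, n = len(init_means), len(vectors[0]), len(vectors)
    weights = [1.0 / k] * k
    means = [list(m) for m in init_means]
    variances = [[1.0] * dim for _ in range(k)]
    for _ in range(n_iter):
        # E-step: soft assignment of every vector to every component.
        resp = [posteriors(x, weights, means, variances) for x in vectors]
        # M-step: re-estimate weight, mean, and variance per component.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-8)
            weights[j] = nj / n
            means[j] = [
                sum(r[j] * x[d] for r, x in zip(resp, vectors)) / nj
                for d in range(dim)
            ]
            variances[j] = [
                max(
                    var_floor,
                    sum(r[j] * (x[d] - means[j][d]) ** 2
                        for r, x in zip(resp, vectors)) / nj,
                )
                for d in range(dim)
            ]
    return weights, means, variances

# Hypothetical 2-D "word vectors"; real ones would have far more dimensions.
toy_vectors = {
    "meeting": (0.9, 0.1), "deadline": (1.0, 0.2), "report": (0.8, 0.0),
    "pasta": (0.1, 0.9), "recipe": (0.0, 1.0), "oven": (0.2, 0.8),
}
words = list(toy_vectors)
data = [toy_vectors[w] for w in words]

# Deterministic initialization from two of the data points (an assumption).
weights, means, variances = fit_gmm(data, init_means=[data[0], data[3]])

# Hard cluster label = index of the component with the highest posterior.
labels = {}
for word, vec in zip(words, data):
    post = posteriors(vec, weights, means, variances)
    labels[word] = post.index(max(post))

print(labels)
```

In a personalization setting, clusters like these could be used to pull in words related to a user's keywords, but how the paper actually uses the clusters is not described in the abstract.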

Similar resources

Improving the Performance of a Telephone Word Spotting System Using Confidence Score Normalization Based on Linear Programming

Conventional word spotting systems determine hypothesized keywords and their confidence scores using a speech recognizer. These keywords are accepted or rejected by comparing their scores with a specific threshold. It has been shown that the confidence score produced by the recognizer is highly dependent on the sub-word structure of each keyword. So comparing assigned scores to keyw...

Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting

Islamic Republic of Iran Broadcasting (IRIB), as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, the IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieval of this archive, is one of the key issues for this broadcasting...

Speech-To-Text Conversion in French

Speech-to-text conversion of French necessitates that both the acoustic-level recognition and language modeling be tailored to the French language. Work in this area was initiated at LIMSI over 10 years ago. In this paper a summary of the ongoing research in this direction is presented. Included are studies on distributional properties of French text materials; problems specific to speech-to-tex...

Continuous Speech Recognition at LIMSI

This paper presents some of the recent research on speaker-independent continuous speech recognition at LIMSI, including efforts in phone and word recognition for both French and English. Evaluation of an HMM-based phone recognizer on a subset of the BREF corpus gives a phone accuracy of 67.1% with 35 context-independent phone models and 74.2% with 428 context-dependent phone models. The word ac...

Teaming Up: Making the Most of Diverse Representations for a Novel Personalized Speech Retrieval Application

In addition to the increasing number of publicly available multimedia documents generated and searched every day, there are also large corpora of personalized videos, images and spoken recordings, stored on users’ private devices and/or in their personal accounts in the cloud. Retrieving spoken items via voice commonly involves supervised indexing approaches such as large vocabulary speech rec...

Publication date: 2015